ABSTRACT
the educational ecology of classrooms. Using the MTMM approach, the study
focuses on the extent to which multiple methods of inquiry (surveys, 
drawings, and videos) are valid indicators of classroom teaching and learning 
experiences. Two broad classroom traits were examined: the mode of
instruction and the variety of learning materials used in the classroom. The 
samples consisted of 20 classrooms in grades 3 to 6 in 3 elementary school 
systems. Assessments included 30 to 45 minutes of video taping in the 
classroom during instruction, followed by closed-ended surveys and student 
drawings of teaching and learning. Constrained emergent coding was used to 
score student drawings and classroom videos, with three raters for drawings 
and three raters for videos. Raters established a high degree of consistency
in the coding. Findings indicate that student drawings and classroom videos 
demonstrate strong evidence of convergent and discriminant validity 
(construct validity) in assessing mode of instruction and variety of learning 
materials used in the classrooms. Surveys demonstrated strong evidence of 
discriminant validity, but revealed mixed results for convergent validity. 
Results provide new insights into the use of drawings and videos as 
assessment tools for the educational ecology of classrooms. The drawing and 
video scoring guides are attached. (Contains 11 tables and 46 references.) 
USING MULTIPLE MEANS OF INQUIRY TO GAIN INSIGHT INTO CLASSROOMS:
A MULTI-TRAIT MULTI-METHOD APPROACH

BY

Cengiz Gulek

ABSTRACT 

Since the release of A Nation At Risk: The Imperative for Educational Reform in 1983 (The National 
Commission on Excellence in Education), numerous educators have proposed reforms for education in the US. A 
common belief in current school reform efforts is that teachers within schools must become reflective practitioners 
if they are to be more successful in meeting the needs of increasingly diverse student populations (Schon, 1987,
1991; Sternberg & Horvath, 1995). In an effort to help promote reflection about instructional improvement, this
study explored the possibility of utilizing Campbell and Fiske’s (1959) Multi-Trait Multi-Method approach in order 
to examine the educational ecology of classrooms. Using the MTMM approach, this study primarily focused on the 
extent to which multiple methods of inquiry (surveys, drawings, and videos) are valid indicators of classroom 
teaching and learning experiences. Aspects of teaching and learning examined include two broad classroom traits:
the mode of instruction and the variety of learning materials used in the classroom. 

Twenty classrooms in three different elementary school systems constituted the sample (Grades 3 to 6). 
Assessments included 30-45 minutes of video taping of the classroom during instruction, followed by closed-ended 
surveys and student drawings of teaching and learning. Constrained emergent coding was utilized to score student 
drawings and classroom videos. Three raters for drawings and three raters for videos were employed for scoring. 
Raters established a high degree of consistency in coding drawings and videos. 

Findings indicated that student drawings and classroom videos demonstrated strong evidence of
convergent validity and discriminant validity (i.e., construct validity) in assessing mode of instruction and variety of
learning materials used in the classroom. Although surveys demonstrated strong evidence of discriminant validity, they
revealed mixed results for convergent validity. 

Results provided new insights into the utilization of drawings and videos as unique assessment tools to 
document the educational ecology of classrooms. Drawings may constitute a cost-effective alternative to video 
taping when school practitioners have limited resources. The discussion includes implications of findings for 
education, limitations of the present study, and prospects for future research. 
Introduction



Since the release of A Nation at Risk: The Imperative for Educational Reform in 1983 (The National 
Commission on Excellence in Education), numerous educators have proposed reforms for education in the 
United States. A variety of reports emphasized the importance of reform in areas of education ranging from 
teacher training to professional development, from educational assessment to school governance. Reform 
efforts concerning education were subsequently initiated at national, state, and district levels. Haney et al.
(1997) highlighted several themes as the cornerstone of the educational reform movement. First, teachers
must become reflective practitioners in order to more successfully meet the needs of the increasingly diverse
student population (Schon, 1987, 1991; Sternberg & Horvath, 1995). Second, student outcome measures
should become the cornerstone for judging the success of reform efforts. Third, educational research and
assessment should be the driving force in improving educational practices. Finally, alternative
means of assessing the educational ecology of classrooms are necessary. Two examples of such alternative
modes of assessment, as suggested by the present study, are drawings of teaching and learning, used as projective
measures, and classroom video recordings, used as observations.

Efforts to judge the success of reform efforts in schools are moving toward utilizing multiple 
measurements of student progress. For instance, the San Diego City Schools use six indicators of student
achievement for their Accountability System: Norm-Referenced Tests, Portfolios, Report Card Grades,
On-Demand Performance Assessment, English Language Development Assessment, and Course Enrollment
(SDCS Annual Report 1995-96). However, educational research accounts of utilizing different research
methods tend to be limited to the analysis of single-measure designs. A simple way of illustrating this general
pattern is to conduct a search of the Educational Resources Information Center (ERIC) holdings.

The “ERIC is a national information system established in 1966 by the federal government to 
provide ready access to educational literature by and for educational practitioners and scholars... After three 
decades, the ERIC system has indexed close to 1 million documents” (Haney, 1997, p. 7). One feature
of the ERIC system is its use of keywords or “descriptors” that are designed to help identify the major and
minor topics treated in the documents. These controlled vocabulary descriptors are described in the
Thesaurus of ERIC Descriptors (Houston, 1995). As Haney (1997) noted, “In the last thirty years, ERIC has
grown into the largest and best-indexed database on educational research in the US, if not the world” (p. 8).
Using the descriptors indexed in ERIC, an on-line search of the ERIC database suggests that the broad
educational literature has devoted a great deal of attention to “questionnaires” as a method of inquiry,
followed by “observation” (see Table 1). “Projective Measures,” which also include “children’s art” and
“freehand drawings,” seem to be much less frequently used in the educational literature indexed in ERIC.



Table 1: On-Line Search of ERIC Database Concerning Different Research Methods.

Descriptor                                            1966-1981  1982-1991  1992-Sept. 1997  Total Number of Entries
                                                                                                   (1966-Sept. 1997)
Questionnaires                                           15,352     10,708            7,012                   33,072
Observation                                               6,256      5,210            2,904                   14,370
Projective Measures                                         230         97               34                      361
Questionnaires and Observation                              533        401              269                    1,203
Questionnaires and Projective Measures                        8          8                2                       18
Observation and Projective Measures                           7          3                0                       10
Questionnaires & Observation & Projective Measures            0          0                0                        0

Source: On-Line Search of ERIC database for years from 1966 to September 1997 (Entries as of February 1998).
An examination of observation and projective measures as methods of inquiry in educational
research revealed striking results. Of the 14,370 studies using observation as a method of inquiry, only 14
used “video recordings” as a technique. Also, only 28 studies using projective measures as a data collection
procedure utilized “freehand drawings” or “children’s art.” As shown in Table 1, when studies used more than
one means of research inquiry, observation was frequently the method that complemented
questionnaires. Nevertheless, of the three methods of interest (i.e., questionnaires, observation,
and projective measures), none of the studies indexed in ERIC appear to have utilized more than two of
these methods of inquiry.

Another surprising finding was that studies which utilized student drawings as a projective method of 
assessment limited their analysis of children’s drawings to clinical uses such as studying child abuse (e.g.,
Bardos, 1993). There were no studies that utilized student drawings to explore the educational ecology of
classrooms by examining drawings of teachers. Student drawings of the teacher at work may reveal
important aspects of students’ attitudes toward classroom learning and of the teacher’s approach to instruction.
Figure 1 shows one student’s depiction of a teacher at work in the classroom. 

Figure 1: A Student Portrait of a ‘Teacher-Directed’ Classroom.



Figure 2, drawn by another student, provides a different insight into teaching and learning in the 
classroom. This student portrays student-oriented teaching and a group-based learning environment.

Figure 2: A Student Portrait of ‘Group-Based’ and ‘Student-Oriented’ Instruction.



The contrast between Figure 1 and Figure 2 shows how widely the mode of instruction in the classroom,
as perceived by students, can vary from one classroom to another. 
In an effort to help schools promote reflection about instructional improvement, the purpose of the
present study was to explore the possibilities of utilizing Campbell and Fiske’s (1959) Multi-Trait
Multi-Method (MTMM) approach in order to document the construct validity of drawings as an assessment tool. The
present study suggests that schools should seek alternative perspectives on the essence of the educational 
endeavor based on the insights and perspectives of those who are the most assiduous observers of school and 
classroom life, namely students. Using the MTMM approach, this study primarily focused on the extent to 
which drawings of the classroom and classroom videos, relatively unusual methods of assessment, and 
questionnaires were valid indicators of classroom teaching and learning. The aspects of teaching and learning 
examined were two broad classroom traits, mode of instruction and variety of classroom learning materials 
used by teachers and students. 

Campbell & Fiske (1959) emphasize four aspects of the validation process when using the MTMM
approach:

(1) Validation is typically convergent, a confirmation by independent
measurement procedures. Independence of methods is a common
denominator among the major types of validity;

(2) For the justification of novel trait measures, for the validation of test
interpretation, or for the establishment of construct validity, discriminant
validation as well as convergent validation is required;

(3) Each test or task employed for measurement purposes is a trait-method unit,
a union of a particular trait content with measurement procedures not
specific to that content; and

(4) In order to examine discriminant validity, and in order to estimate the
relative contributions of trait and method variance, more than one trait as
well as more than one method must be employed in the validation process
(Campbell & Fiske, 1959, p. 81).



Within this MTMM framework, construct validity was examined by assessing convergent 
validity and discriminant validity. The three methods of assessment constituted the basis for examining
convergent validity, and the two classroom traits provided data to investigate discriminant validity.
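
The logic of this framework can be illustrated with a minimal sketch. Assuming classroom-level scores are available for each trait-method unit, the code below computes Pearson correlations between all pairs of units and sorts them into the groups that the MTMM approach compares: the same trait measured by different methods (convergent evidence) and different traits measured by the same or by different methods (discriminant evidence). The function names, the dictionary layout, and the scores are hypothetical illustrations; they are not the present study's data or its analysis code.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def mtmm_blocks(scores):
    """Group pairwise correlations by their role in an MTMM matrix.

    `scores` maps (trait, method) -> list of classroom-level scores,
    with classrooms in the same order for every trait-method unit.
    """
    blocks = {
        "monotrait_heteromethod": {},    # same trait, different methods
        "heterotrait_monomethod": {},    # different traits, same method
        "heterotrait_heteromethod": {},  # different traits and methods
    }
    for unit_a, unit_b in combinations(sorted(scores), 2):
        (trait_a, method_a), (trait_b, method_b) = unit_a, unit_b
        r = pearson(scores[unit_a], scores[unit_b])
        if trait_a == trait_b:
            blocks["monotrait_heteromethod"][(unit_a, unit_b)] = r
        elif method_a == method_b:
            blocks["heterotrait_monomethod"][(unit_a, unit_b)] = r
        else:
            blocks["heterotrait_heteromethod"][(unit_a, unit_b)] = r
    return blocks


if __name__ == "__main__":
    # Hypothetical scores for five classrooms (illustration only).
    example = {
        ("instruction", "survey"): [2.1, 3.0, 2.5, 3.6, 1.8],
        ("instruction", "drawing"): [1.9, 3.2, 2.4, 3.8, 1.5],
        ("instruction", "video"): [2.0, 3.1, 2.8, 3.5, 1.7],
        ("materials", "survey"): [3.0, 2.2, 3.4, 2.6, 2.9],
        ("materials", "drawing"): [3.2, 2.0, 3.6, 2.4, 3.1],
        ("materials", "video"): [2.9, 2.3, 3.5, 2.2, 3.0],
    }
    for name, pairs in mtmm_blocks(example).items():
        print(name, len(pairs))
```

Under this reading, convergent validity is supported when the same-trait, different-method correlations are high, and discriminant validity is supported when those correlations exceed the correlations between different traits.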

The contributions of the present study to the field of education are numerous. First, it 
documented the validity of drawings and videos as two unique assessment tools. Second, the present 
study attempted to provide information on the extent to which closed-ended surveys, student drawings, 
and classroom videos can be used as alternative assessment tools to gather information about classroom 
teaching and learning experiences. 

Finally, this study utilized student drawings as a method of inquiry in order to gain insight into 
classrooms. This was one of the highly unusual aspects of the present study. The 5-year experience of 
using drawings as an assessment tool at Boston College’s Center for the Study of Testing, Evaluation and 
Educational Policy (CSTEEP) demonstrated how drawings, as visual and graphical depictions of 
classroom experiences, tend to engage teachers in reflecting on their classroom teaching practices. 
Drawings also give an opportunity to students to freely express their classroom attitudes and learning 
experiences. Including the teacher’s perspective as the “key role player” in the classroom is another 
important aspect of this study that will provide insight into teaching and learning. Unlike assessments
that use surveys or questionnaires and consider the “what’s,” drawings and videotapes of classrooms
stimulate discussion of the “how’s.”

A great body of educational research has been devoted to aspects of classroom experiences 
ranging from classroom atmosphere to instructional practices, from teacher satisfaction to classroom 
organization and management. Studies show that small group cooperative learning and interactive 
teaching produce promising results for students in earning higher grades, gaining more academic and 
social benefits, developing critical thinking, and being better decision makers (Ridling, 1994; Pardo &
Raphael, 1991). Studies of classroom social climate yield results very similar to those of studies examining
classroom organization and management. Classroom atmosphere positively impacts achievement (Gulek, 
1994), self-concept (Houser, 1991; Wright & Cowen, 1985), students’ on-task behaviors (Short & Short, 
1988), and self-perceptions of teacher competence (Evangelau, 1991). Brulle, Brulle & Smith (1994) 
found that the position of the teacher in the classroom (closer to students as opposed to lecturing from a 
distance at the blackboard), accompanied by positive verbal comments, tends to create a positive learning 
environment. Furthermore, group-oriented teaching and student-centered learning usually led to positive 
classroom environments (Ridling, 1994; Pardo & Raphael, 1991) that in turn enhanced student learning 
and achievement (Gulek, 1994; Short & Short, 1988). These findings provided substantial evidence that 
classroom characteristics were related. In other words, there was evidence of convergent validity among 
the classroom characteristics cited above. However, because the present study examined not only the 
convergent validity but also the discriminant validity of drawings and videos, it was essential to study 
classroom traits/characteristics that were orthogonal. Orthogonality here refers to sets of
classroom traits that are unrelated to one another. For example, since there seemed to be a positive correlation between student-centered,
group-based learning and a positive classroom environment, type of instruction and classroom
atmosphere were not orthogonal and could not be studied to demonstrate discriminant validity. The
assumption here was that the type of learning materials used in the classroom does not depend on the
kind of instructional technique employed. In order to satisfy this condition, this study examined the mode
of instruction and the variety of learning materials used in the classroom as two classroom traits. Each of
these classroom traits was measured by a number of variables and assessed through three different modes 
of inquiry. Mode of instruction and variety of learning materials used in the classroom as two classroom 
traits, and the three modes of inquiry are discussed in detail in the following sections. 

Mode of instruction is defined, in this study, as the extent to which the instruction in the 
classroom is student-centered versus teacher-directed. Eight closed-ended survey questions were used to
measure the extent to which the mode of instruction, as perceived by students, is student-centered or 
teacher-directed. Higher scores in the closed-ended questions indicate a more student-centered mode of 
instruction. Aspects of highly student-centered classes as revealed in drawings and videos included 
classroom portraits of teacher located with or close to students, students in groups or doing projects, 
student desks clustered, or teacher talk praising or inviting students to discussion. The characteristics of
the student-centered mode of instruction included a classroom structure similar to a cooperative learning
process. In this mode, the student was the dominant figure in the learning process. The interaction
involved teachers questioning or praising the students in their class. In contrast, the teacher-directed mode of
instruction was defined as the teacher lecturing in the class, with the interaction between students and the
teacher limited to the teacher as the dominant figure, such as a teacher who spends most of his/her
time assigning homework or disciplining students.

The variety of learning materials was the other classroom trait examined in this study. This refers to the variety
of materials that surround students in the class and are used to stimulate and enhance learning. These
materials included, but were not limited to, calculators, maps/globes, projects, charts, overhead projectors,
computers, textbooks/worksheets, and blackboards.

This study utilized three different methods of assessment: closed-ended survey questions, student
drawings, and classroom videos. Eight closed-ended survey questions assessed the mode of
instruction, and eight closed-ended questions assessed the learning materials used in the classroom. These
16 closed-ended questions were derived from nationally and internationally administered surveys and
were tailored to meet the needs of this study. The drawing exercise and a 45-minute classroom video
were the two other modes of inquiry that were used to assess these two classroom traits: mode of
instruction and variety of learning materials used in the classroom.
Background



As Golomb notes, “For over a hundred years, the drawings of children have enchanted a rather 
diverse audience of psychologists, educators, art historians and artists” (Golomb, 1992, p. 1). Indeed,
Harris (1997) also stated that “Human figure drawings are often seen as an important part of
psychological evaluations and have been used for over 100 years to describe and assess human
behavior.” Haney et al. (1997) further note that as early as 1885, Ebenezer Cooke published an article on
children’s drawings in which he described the successive stages of development as he had observed in 
them. 

Soon after Goodenough published her book, Measurement of Intelligence by Drawings, in 1926,
psychologists made extensive use of children’s drawings in their clinical practice and psychological
research. As Haney et al. (1997) suggest, clinical use of drawings has been limited to drawings by
children mostly between the ages of 4 and 12. Only a handful of studies investigated drawings by older
subjects. Lubin, Larsen, & Matarazzo (1984) found that Goodenough’s Draw-A-Man test was one
of the 10 most widely used psychological tests in clinical practice (the House-Tree-Person drawing test
was also among the top ten tests used). Ironically, the development of the Draw-A-Man test was based on
samples of children aged 4 to 10 years. Haney et al. (1997) also point out that “despite the century-old
tradition of using children’s drawings in psychological research, very little educational research, other
than that focused on art education, has employed drawings” (p. 11).

Despite the fact that drawings are widely used in clinical practice and school psychology, there 
has been a long tradition of debate over the reliability and the validity of drawings as a measure of 
personality and social behavior and as devices to assess intelligence. Recently, the School Psychology
Quarterly (1993, Vol. 8, No. 3) devoted an entire issue to the controversy over the validity of drawings
used in clinical and educational practice. Across the six articles published in this issue, the debate centered
on whether one can make valid inferences from human figure drawings (the Draw-A-Man Test and the
Kinetic Family Drawings test are two examples of this kind).

The work with drawings in the present study deviates from the past use of drawings in two major 
respects. First, the present study did not try to make inferences about individual students from drawings,
but instead aimed at documenting patterns across classrooms, that is, the
educational ecology of classrooms. Second, the present study did not seek to psychologically interpret
results of drawings for classrooms.

Unlike many standardized assessments, particularly those involving specific content areas such as math
or science, drawings spur an extensive discussion among teachers about their teaching strategies.
Compared to the discussion of “usual” assessment results (standardized tests) in schools, teachers tend
to engage more with student perceptions of teaching and learning. Haney et al. (1998), for example,
reflected on their experiences in sharing with teachers assessment results in several content areas (e.g.,
math and science) and results of student drawings as follows:

When teachers discussed students’ test results, conversation was typically brief and 
tended to focus on whether more or less of various subject matters should be taught. But 
when stimulated by drawings, discussions tended to be more extended and to focus on 
not just what was taught but also on how it was taught (Haney et al., 1998, p. 39).

Boston College’s Center for the Study of Testing, Evaluation and Educational Policy, as a strong 
advocate of utilizing multiple modes of assessment, has faced two criticisms in using student drawings to 
spur discussion of the educational ecology of classrooms. First, it is unclear whether students’ drawings
represent accurate pictures of teaching they have experienced or whether instead they represent students’
stereotypes and caricatures of teachers. Drawings clearly are influenced by people’s stereotypes and their
understanding of symbolic representations of classrooms and teaching. However, even if drawings do
represent students’ stereotypes about teaching, it is still important to recognize them. As Steele (1997)
notes, student learning is affected not just by how they are taught, but also by the expectations and
stereotypes surrounding their education and learning. Second, in CSTEEP’s long tradition of using
drawings in comprehensive school assessments, teachers have always raised concerns about whether or
not students have taken drawing exercises seriously. As Haney et al. (1998) suggest, the one thing that is
clear about drawings is that they help adults think seriously about children’s perspectives.

The video-recording method of assessment has become popular and is receiving increased 
attention due to its capability to reveal insights into the classroom from a different perspective. Two 
immediate examples of video as an assessment tool to aid instructional practices for teachers are the 
works of Carlin (1996) and Rowley & Hart (1996). Carlin identifies three apparent features of using 
video to examine learning and to guide instruction: (1) it provides a tangible audiovisual record of the 
learning process and outcomes; it can be replayed any number of times, allowing students to observe 
their own actions; (2) it provides a basis for a longitudinal assessment that can be made by repeating the 
video-recording on several occasions during the learning process; and (3) it provides a broad-based 
record of student-teacher, and student-student interaction during the learning process. Rowley & Hart 
focus their discussion on using video as a way of assessing teachers in reflecting on their teaching 
practices. As they note “video case studies that portray realistic classroom situations give pre-service, 
novice, and veteran teachers an opportunity to share experiences and reflect together on best practice” 
(p.28). Classroom videos also provide teachers an opportunity to critically review their teaching methods 
and students to observe their own actions. 

Another example of the use of videos in large-scale assessments came from the Third 
International Mathematics and Science Study. The TIMSS video study attempted to explore teaching and 
learning patterns in three nations (Germany, Japan, and the United States). As Stigler & Hiebert (1997) note, a
videotape study of classroom instruction allows us to refocus on teaching processes, with the aim of
improving students’ learning. The videos enabled them to seek answers to several questions, such as: (1)
What kind of mathematics do students encounter? (2) Are mathematical concepts and procedures
developed? (3) What are students expected to do? (4) What is the teacher’s role? (5) How are the lessons
organized? and (6) How do teachers view reform?

Also, the National Board for Professional Teaching Standards (NBPTS) requires all teachers to
provide samples of their videotaped teaching in order to be certified. Video is thus an increasingly
valued and frequently used means of assessment to examine teaching and learning practices in
classrooms. Hence, the present study made use of classroom videos and drawings along with survey
questionnaires as multiple methods of assessment in order to gain insight into teaching and learning in
upper elementary classrooms.



Methodology 

The present study was carried out in two phases: Pilot Study Phase and Main Study Phase. The 
purpose of the pilot study was to experiment with drawing prompts and procedures for videotaping that 
would constitute the foundation for the main study. The pilot study was intended to help the principal
investigator determine: (1) if drawing prompts and video-recording would provide the kind of
information sought; (2) if the procedures such as timing, instructions, content and the difficulty level of 
the survey questions and drawing prompts were clear and precise; and (3) whether scoring guidelines for 
drawings and videos were potentially feasible to develop. 
Participants



Twenty teachers and their students in grades 3 to 6 constituted the final sample of the study. 
Students and teachers in the final sample represented three different school systems. The participants 
were identified on a voluntary basis and were assured full confidentiality. Table 2 below shows the
distribution of participating students by gender and grade level.



Table 2. Distribution of Sample by Gender and Grade Level.

                                        Gender
Grade Level    Number of Classrooms    Female (%)    Male (%)      N
3                                 8            52          48    128
4                                 3            48          52     52
5                                 3            62          38     58
6                                 6            50          50    113
Total Sample                     20            48          52    351


Table 2 indicates that grades 3 and 6 are over-represented in the sample. Gender was approximately equally
distributed in grades 3, 4, and 6. However, among fifth-grade students, female participation
exceeded males (62% versus 38%). Among these 20 classrooms, 8 classrooms were third grade, 3 
classrooms were fourth grade, 2 classrooms were fifth grade, 5 classrooms were sixth grade, and 2 
classrooms contained integrated fifth and sixth grade students. 

Instruments/Methods of Inquiry

The present study employed three different methods of inquiry: surveys, drawings, and videos.
These three methods of inquiry aimed at assessing the mode of instruction and variety of classroom 
learning materials in order to examine the convergent validity and discriminant validity of drawings. 
After the classroom participation for the present study was confirmed, the principal investigator 
informed teachers about the procedures concerning assessments. 

Video recording took place first. Each teacher was required to submit a 30- to 45-minute video
recording of themselves teaching in class. All teachers complied with this requirement. In fact, some teachers
went beyond the requested time limit and submitted classroom videos that exceeded 90 minutes.
Immediately following the video recording, students completed the Reflection Survey, containing
closed-ended questions, and the drawing exercise. The average time to complete the Student Reflection Survey was
approximately 25 minutes. Each classroom completed all assessments in one day. The principal
investigator tried to minimize potential disruption of classroom activities during the assessments. For
example, the principal investigator set up the video camera in the classroom before school began or
during the break. Also, he was purposefully not present in the classroom during video recording.
Closed-ended Survey Questions



The closed-ended survey contained 16 Likert-type statements describing the mode of instruction 
and the variety of classroom learning materials with response options ranging from 4 (almost always) to 
1 (never). For the mode of instruction, students indicated their degree of involvement with a set of 
classroom activities. Higher scores indicated a higher level of involvement with these classroom
activities. For the variety of learning materials, they reported the use of certain classroom
materials as presented in the survey. Higher scores indicated a greater variety of learning materials used in
the classroom. Of the 16 closed-ended questions in the Student Reflection Survey, 11 questions were taken
from the Population 2 (Grade 7-8) Student Questionnaire of the Third International Mathematics and Science Study
(TIMSS; Beaton et al., 1998), 1 question was adopted from the Student Reflection Survey of the
Cooperative Networked Educational Community of Tomorrow (Co-NECT, 1994) Project, and 4
questions were developed by the principal investigator. Because most of the questions were used in
nationally and internationally administered surveys, the reliability and validity of these questions have
been carefully studied in large-scale student populations. The internal consistency of the survey
instrument is also reported for the present study. 
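
The text does not state which index of internal consistency was used. One common choice is Cronbach's alpha, and the sketch below is offered only as an illustration under that assumption. It computes alpha for one set of items (for example, the eight mode-of-instruction questions) from a respondents-by-items table of 1-to-4 responses. The function name and the input layout are hypothetical.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least two items and two respondents")
    if any(len(row) != k for row in responses):
        raise ValueError("every respondent must answer the same number of items")
    # Transpose so that each tuple holds all responses to one item.
    item_columns = list(zip(*responses))
    item_variance_sum = sum(variance(column) for column in item_columns)
    total_variance = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)
```

Values closer to 1 indicate that the items in a set vary together across respondents, which is the sense in which a set of survey questions is said to be internally consistent.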

Drawings 

Drawings of teaching and learning in the classroom provide insight into student perceptions of 
mode of instruction and the variety of learning materials used in the classroom. For the purpose of the 
present study, the principal investigator originally developed two drawing prompts that were to be used in
assessing the mode of instruction and variety of learning materials used in the classroom. As explained 
previously, based on the results obtained from the pilot phase of the present study, the following drawing 
prompt was used to elicit student perspectives about teaching and learning. 

“ Think about the teachers and the kinds of things you have done in your class today. Draw a picture of 
your teacher teaching and yourself learning. ” 

In scoring of drawings, a rating from 1 to 4 was assigned to each drawing for each of the two 
classroom constructs/traits. These ratings were averaged out for the classroom and two separate 
composite scores were computed for each classroom. The data analysis utilized the classroom as a unit of 
analysis. 
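
The aggregation described above can be illustrated with a minimal Python sketch. It assumes each drawing has already received one 1-to-4 rating per trait; the function name, the tuple layout, and the sample ratings are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from statistics import mean


def classroom_composites(ratings):
    """Average drawing ratings within each classroom.

    ratings: iterable of (classroom_id, mode_of_instruction, learning_materials)
    tuples, one per drawing (hypothetical layout).
    Returns {classroom_id: (mean_mode_of_instruction, mean_learning_materials)}.
    """
    by_class = defaultdict(lambda: {"mi": [], "lm": []})
    for classroom, mi, lm in ratings:
        by_class[classroom]["mi"].append(mi)
        by_class[classroom]["lm"].append(lm)
    # One composite per trait per classroom: the classroom is the unit of analysis.
    return {c: (mean(v["mi"]), mean(v["lm"])) for c, v in by_class.items()}


# Made-up ratings for two classrooms.
example = [("A", 2, 3), ("A", 4, 1), ("B", 1, 1)]
composites = classroom_composites(example)
```

Each classroom ends up with exactly two numbers, which is what allows the later correlations to be computed across classrooms rather than across students.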

Video Recording 

The final means of inquiry employed in the present study was videotaping of
classrooms. The videotaping took place prior to the administration of the closed-ended survey questions and
drawing exercises. Teachers were requested to video-record 30-45 minutes of their teaching time in the
classroom. Both teacher and student presence were required during the video recording. Video recording,
administration of closed-ended survey questions, and drawing exercises were done with the same group
of students and the same teacher. The principal investigator set up the video camera on the day of the assessment.
The video camera was stabilized on a designated spot in the classroom. On some occasions simultaneous
assessments were not possible because of the limited number of cameras available for the
assessment. In such situations, a video camera was reused on the same day by setting it up during
the break, to minimize interference with classroom instruction.

The pilot study attempted to identify possible set-up spots for the video camera in the classroom.
Classroom set-ups determined possible placement spots for the video camera. However, experimenting
with different angles in the pilot study provided valuable information for the main study. For example,
video recording experiences during the pilot study led the principal investigator to use a wide-angle
camera for classrooms with relatively small space. The pilot study also helped the principal investigator
establish criteria for the video recording. In order to obtain optimum recordings for scoring
classroom videos, (1) at least 75% of the classroom had to be captured in the recording, (2) diagonal
placement of the camera in either rear corner of the classroom usually constituted the best spot for
recording, and (3) at least 50% of students had to appear in the recording.

The scoring procedures in videos for the mode of instruction trait were different from the scoring
procedures for the variety of learning materials. For the mode of instruction, raters assigned scores from 1 to
4 to each video segment of the classroom they viewed. Then, an average score for this particular trait was
calculated for each classroom. For the learning materials, however, raters listed all of the learning
materials students and teachers actively used in the classroom during video recording. The score for the
variety of learning materials was computed by averaging the number of different materials the raters listed.

Due to time constraints in scoring videos, raters did not preview and score the video recordings
at full length. Instead, raters scored 5 randomly selected one-minute episodes in the videotapes for each
classroom. Therefore, each classroom video initially received 5 different ratings for the mode of
instruction. These five ratings were then averaged to constitute a single classroom rating for the
mode of instruction. For the variety of learning materials, raters independently listed all of the learning
materials the teacher or students used in the videos as seen in the 5 one-minute video segments. The
number of different learning materials listed across all five one-minute video episodes constituted the
variety of learning materials score. The scores from the raters were then averaged to assign a single
classroom score for this particular trait.
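
As a hedged illustration of the two video scoring rules, the following Python sketch averages segment ratings for mode of instruction and counts distinct materials for variety of learning materials. The function names, the nested-list layout, and the sample values are assumptions made for the example, not the study's actual scoring sheets.

```python
from statistics import mean


def video_mode_score(segment_ratings_by_rater):
    """Mode of instruction: average each rater's five segment ratings,
    then average across raters to get one classroom score."""
    return mean(mean(ratings) for ratings in segment_ratings_by_rater)


def video_materials_score(materials_by_rater):
    """Variety of learning materials: for each rater, count the distinct
    materials listed across the five segments, then average the counts."""
    return mean(len(set().union(*segments)) for segments in materials_by_rater)


# Hypothetical data for one classroom scored by two raters.
mode_ratings = [[3, 3, 4, 2, 3], [3, 4, 4, 2, 2]]
materials = [
    [{"book"}, {"book", "chalkboard"}, set(), {"pencil"}, {"book"}],
    [{"book"}, {"chalkboard"}, set(), set(), set()],
]
mode_score = video_mode_score(mode_ratings)
materials_score = video_materials_score(materials)
```

A material that appears in several segments is counted once per rater, which matches the description of the score as the number of different materials listed.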

The procedure of randomly selecting one-minute time slots to score videos is called “time
sampling.” Because the time slots are selected at random, they can be expected to represent the
classroom activities fairly. Time sampling encompassed the following steps:

1. Total time of video recording for each classroom was computed and recorded in minutes. For
example, one classroom had a video recording of 42 minutes.

2. Because five one-minute time slots were selected from each classroom video, total video recording
time was divided by 5. That is, 42 / 5 = 8.4.

3. The result was rounded to the nearest integer, i.e., 8 in this example.

4. A computer-generated random number (from 0 to 9) was used to identify the first one-minute
time slot. As an example, assume the random number was 4. The five one-minute
samples for that class would then have been the 4th, 12th, 20th, 28th, and 36th minutes.
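
The four steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name and parameters are hypothetical, ties at .5 are rounded up (the text does not say how they were handled), and the sketch does not address what was done when a computed slot fell beyond the end of a recording.

```python
import random


def time_sample_slots(total_minutes, start=None, n_slots=5, rng=random):
    """Return the minute marks of the sampled one-minute slots."""
    # Steps 2 and 3: spacing between slots is total time divided by the number
    # of slots, rounded to the nearest integer (halves rounded up, an assumption).
    interval = int(total_minutes / n_slots + 0.5)
    # Step 4: random starting minute from 0 to 9, unless one is supplied.
    if start is None:
        start = rng.randint(0, 9)
    return [start + k * interval for k in range(n_slots)]


# The worked example from the text: a 42-minute recording with random start 4.
slots = time_sample_slots(42, start=4)
```

Passing `start` explicitly reproduces the worked example; leaving it out draws the starting minute at random, as in step 4.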

Design

The present study used a mainly correlational research design. The purpose of the study was to
explore the extent to which three methods of assessment (i.e., surveys, drawings, and videos) correspond in
assessing two classroom traits (i.e., mode of instruction and learning materials). The present study also
examined the extent to which the two classroom traits differ from each other as independently assessed
by surveys, drawings, or classroom videos. In other words, this study attempted to establish the construct
validity (convergent and discriminant) of drawings and videos as alternative research methods via the
Multi-Trait Multi-Method approach.

The present study incorporated two dimensions that were constructed to fit the Multi-Trait
Multi-Method (MTMM) approach: the Means of Inquiry dimension and the
Classroom Traits dimension. The means of inquiry dimension included the types of research methods
employed in the present study. The three means of inquiry were
closed-ended surveys, student drawings, and classroom videos. The classroom traits dimension included
the mode of instruction and the variety of learning materials used in the classroom. The main design of
the study incorporated the Multi-Trait Multi-Method approach to integrate two classroom traits and
three methods of inquiry.



Results 

As indicated previously, the classroom was the unit of analysis in all analyses. In order to
examine the variation in the data, descriptive statistics were computed for closed-ended surveys, student
drawings, and classroom videos. Table 3 shows means and standard deviations for the mode of
instruction and variety of learning materials traits as assessed by surveys, drawings, and videos at the
classroom level.



Table 3. Mean and Standard Deviation Scores for Mode of Instruction and Learning Materials by the 
Method of Assessment at the Classroom-Level. 



                          Mode of Instruction     Learning Materials
Assessment Mode     N     Mean      S.D.          Mean      S.D.
Surveys            20     18.37     1.53          4.65      1.21
Drawings           20      2.49      .41          5.20      1.94
Videos             19      3.11      .48          3.47       .98



The mean score for mode of instruction in surveys was 18.37 with a standard deviation of
1.53 (see also Table 3). The minimum and maximum scores were 15.07 and 20.95, respectively. The
range of scores that could possibly be obtained from the survey was between 8 and 32. Thus, classroom
level data from surveys did not show great variation among the 20 classrooms. The mode of instruction
trait in drawings, however, yielded a more diverse distribution. The mean score was 2.49 and the
standard deviation was .41. Minimum and maximum scores were 1.90 and 3.22, respectively. The
minimum possible score that could be assigned to mode of instruction in drawings was 1.00 and the
maximum score was 4.00. The data appear to be approximately normally distributed for mode of instruction
as assessed by drawings. Mode of instruction in videos showed a distribution slightly skewed toward the
higher end. The mean score was 3.11 with a standard deviation of .48. The range of scores
as assigned by three raters was between 2.07 and 3.87.

As also shown in Table 3, mean scores for the variety of learning materials in surveys, drawings,
and videos were 4.65, 5.20, and 3.47, respectively. Standard deviations were 1.21 for surveys, 1.94
for drawings, and .98 for videos. The range of scores for the variety of learning materials trait was between
1.33 and 6.80 for surveys. Whereas classroom videos had the narrowest range of scores for the variety of
learning materials trait (between 1.67 and 5.00), drawings had the widest range (between 2.33 and
9.67). It is important to note that the data showed a slightly skewed distribution (toward the higher end)
in surveys and videos for the variety of learning materials trait. However, for the same trait, drawings
showed a much less skewed distribution.
The reliability analyses included Cronbach alpha, Spearman-Brown Equal Length, and Guttman
split-half procedures. These reliability procedures for closed-ended survey data were applied to student
level and classroom level data. Table 4 shows individual as well as classroom level reliability
analysis results for the survey instrument used in this study.



Table 4. Student Reflection Survey Reliability Coefficients for Individual and Classroom Level Data Analyses. 



                                    Level of Analysis
Reliability Procedure               Individual/Student     Group/Classroom
Correlation Between Forms                  .48                   .72
Cronbach Alpha                             .41                   .63
Guttman Split-Half                         .64                   .84
Spearman-Brown Equal-Length                .65                   .84
Spearman-Brown Unequal-Length              .65                   .84
N (Sample Size)                            336                    20



Table 4 shows that the classroom-level analysis yielded much higher reliability coefficients than
the student-level analysis. The data support carrying out the analyses at the classroom level.
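
The relationship between the correlation between forms and the Spearman-Brown equal-length coefficient in Table 4 can be checked with a short Python sketch, which also includes a textbook version of Cronbach's alpha. The function names and the small score matrix are hypothetical; the study's item-level data are not reproduced here, and the statistical software actually used is not stated.

```python
from statistics import variance


def spearman_brown(r_halves):
    """Step a half-test correlation up to full-length reliability: 2r / (1 + r)."""
    return 2 * r_halves / (1 + r_halves)


def cronbach_alpha(rows):
    """Cronbach's alpha for a matrix of scores.

    rows: one list of item scores per respondent (hypothetical layout).
    alpha = k/(k-1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Correlations between forms reported in Table 4.
classroom_level = round(spearman_brown(0.72), 2)
student_level = round(spearman_brown(0.48), 2)

# Made-up scores: two perfectly consistent items give alpha = 1.
alpha_demo = cronbach_alpha([[1, 1], [2, 2], [3, 3], [4, 4]])
```

Applying the formula to the reported between-form correlations of .72 and .48 gives .84 and .65, the Spearman-Brown equal-length values listed in Table 4.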

Student Drawings 

Development of Scoring Guidelines for Student Drawings 

Several graduate students at Boston College were invited to participate in this study as judges in
developing and later applying initial scoring guidelines for drawings. Seven judges agreed to
take part in the study. The judges included both master's and doctoral students and represented a variety
of programs within Boston College’s School of Education. Initially, the principal investigator drafted
scoring rubrics, followed by individual consultation and reconciliation with each of the seven judges.
The modified scoring rubrics included explanations for both classroom traits (mode of instruction and variety
of learning materials) along with three samples of student drawings to clarify each rating. The seven
judges were then presented with a sample of 40 drawings to score for mode of instruction and
variety of learning materials, based on the modified scoring guidelines. The results were highly
satisfactory. The average exact agreement was 71%, and the average agreement among the seven judges
within one point was 94%. The judges reached 100% agreement within two
points. Pearson coefficients of correlation among the raters were also computed. The minimum and
maximum correlation coefficients were .57 and .89, respectively, with a median correlation of .75. The
results of the pilot scoring phase indicated that the scoring guidelines were sound and could be used
in the main study.
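
Exact and within-one-point agreement of the kind reported above can be computed along the following lines. The sketch assumes that "average agreement" means the mean of pairwise agreement rates across raters, which the text does not state explicitly; the function name and the sample scores are hypothetical.

```python
from itertools import combinations


def average_agreement(scores_by_rater, tolerance=0):
    """Mean pairwise agreement rate among raters.

    scores_by_rater: one list of scores per rater, aligned by drawing.
    tolerance=0 gives exact agreement; tolerance=1 gives agreement
    within one point (assumed pairwise definition).
    """
    rates = []
    for a, b in combinations(scores_by_rater, 2):
        hits = sum(abs(x - y) <= tolerance for x, y in zip(a, b))
        rates.append(hits / len(a))
    return sum(rates) / len(rates)


# Made-up scores from three raters on four drawings.
demo = [[1, 2, 3, 4], [1, 2, 4, 4], [2, 2, 3, 1]]
exact = average_agreement(demo, tolerance=0)
within_one = average_agreement(demo, tolerance=1)
```

Agreement rates and correlations answer different questions: two raters can correlate highly while one scores consistently higher than the other, which is presumably why both kinds of index were reported for the pilot scoring.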
Rater Training for Scoring Student Drawings



Two new raters were invited to take part in this part of the study. One rater was a master’s
student in the Counseling Psychology program at Boston College and the other was a doctoral student in the
Educational Research, Measurement and Evaluation Program at Boston College. The principal
investigator provided the raters with the scoring guidelines, separate for the two traits (mode of instruction
and learning materials). Each rater underwent a one-hour training session. The session included an
introduction to the purpose of this study, explanation of the scoring guidelines, and discussion of several
drawing examples in conjunction with the scoring rubrics. These two raters were then asked to
independently score a sample of 40 drawings. These drawings were the same set of drawings that had been
used to assess the consistency among the initial panel of seven judges who participated in the development
of the scoring rubrics. After the coding of the 40 drawings, the two raters and the principal investigator conferred
with each other to resolve any issues related to scoring. The issues that came up during the conference
were all related to the clarity of lines in drawings that raters found hard to judge, and not to
the scoring guidelines. The raters reported that they felt comfortable using the rubrics. Together with the
principal investigator, the two raters then scored all of the student drawings collected
for the purpose of this study.

Assessment of Rater Consistency in Scoring Student Drawings 

Consistency among the three raters was assessed via inter-rater reliability procedures. In the
summer of 1998, three raters scored drawings from 351 students representing grades 3-6 in 20
classrooms. The procedure was conducted separately for the mode of instruction and variety of learning
materials traits. Table 5 below presents coefficients of correlation among the three raters for the mode of
instruction trait, with the data analyzed at the classroom level.

Table 5. Inter-Rater Reliability Correlation Matrix of Mode of Instruction in Scoring Student Drawings at 
the Classroom Level. 



              Raters
Raters     (1)       (2)       (3)
(1)        1.00      .90**     .80**
(2)                  1.00      .79**
(3)                            1.00

Note. N = 20 classrooms; ** p < .01.



As shown in Table 5, the findings from classroom level analysis indicate a high level of 
consistency among the three raters. The coefficients of correlations for the mode of instruction trait were 
.90, .80, and .79 between Rater 1 and Rater 2, between Rater 1 and Rater 3, and between Rater 2 and 
Rater 3, respectively. All of the correlation coefficients were statistically significant at the .01 level. To 
summarize, raters scored mode of instruction trait in student drawings with a high degree of consistency. 
Similar procedures were also applied to the variety of learning materials trait in order to assess the inter-rater
reliability among the same three raters. Again, the data were treated at the classroom level. Table 6
presents the results from this analysis.

Table 6. Inter-Rater Reliability Correlation Matrix of Variety of Learning Materials in Scoring Student 
Drawings at the Classroom Level. 



              Raters
Raters     (1)       (2)       (3)
(1)        1.00      .81**     .67**
(2)                  1.00      .83**
(3)                            1.00

Note. N = 20 classrooms; ** p < .01.









Careful inspection of Table 6 indicates that, although the correlation coefficients are not as high as those
obtained from the analysis of the mode of instruction trait, the findings are highly satisfactory. For the
variety of learning materials trait, the lowest correlation was observed between Rater 1 and Rater 3 (r = .67)
and the highest between Rater 2 and Rater 3 (r = .83). Similar to the results observed in the inter-rater
reliability coefficients for mode of instruction, all of the correlation coefficients for variety of
learning materials were statistically significant at the .01 level (see also Table 6).
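
The inter-rater matrices in Tables 5 and 6 are pairwise Pearson correlations of classroom-level scores. A minimal Python sketch of that computation follows; the function names and the sample classroom means are hypothetical, and the sketch is not the software the study used.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rater_correlations(classroom_means_by_rater):
    """Upper triangle of the rater-by-rater correlation matrix.

    classroom_means_by_rater: one list per rater, holding that rater's
    classroom-level mean score for each classroom (same classroom order).
    Keys are 1-based rater pairs such as (1, 2).
    """
    n_raters = len(classroom_means_by_rater)
    return {
        (i + 1, j + 1): pearson_r(classroom_means_by_rater[i],
                                  classroom_means_by_rater[j])
        for i, j in combinations(range(n_raters), 2)
    }


# Made-up classroom means for three raters across four classrooms.
demo = [[2.0, 2.5, 3.0, 3.5], [2.1, 2.4, 3.2, 3.4], [1.8, 2.9, 2.7, 3.6]]
matrix = rater_correlations(demo)
```

With three raters the function returns three coefficients, matching the three off-diagonal entries shown in each inter-rater table.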

Classroom Videos 



Development of Scoring Guidelines for Classroom Videos 

The procedure followed to develop scoring guidelines for classroom videos was similar
to the procedure used to develop scoring rubrics for drawings. First, classroom video samples were
previewed in order to identify aspects of classroom teaching and learning to be considered for scoring. A
list of classroom characteristics was compiled by the principal investigator. This list was then consolidated
with the scoring rubrics that had been developed for drawings. The resulting list of aspects of the classroom
as seen in videos was then discussed with a panel of four judges, two of whom were to participate in
scoring classroom videos. The scoring rubrics for videos were modified in accordance with the
suggestions offered by the committee of judges. The purpose of this procedure was to make the scoring
guidelines for videos as parallel as possible to the scoring guidelines for drawings.

Rater Training for Scoring Classroom Videos 

Following the finalization of scoring rubrics for videos, the principal investigator offered a one-hour
training session for the two raters. This session covered an introduction to the purpose of the study,
description and explanation of the scoring guidelines with examples of one-minute video episodes, pilot
scoring of 21 one-minute video segments, and conferencing on issues that might have been related to the
scoring guidelines. During this training session, all raters independently scored a sample of 21 one-minute
video episodes in order to assess the applicability of the newly developed scoring guidelines for coding
classroom videos. The results of the sample scoring indicated a high level of consistency among the three
raters (about 61% exact agreement on average and 100% agreement within one point). Pearson
correlations were also substantially high for this pilot scoring. They were .88 (between Rater 1 and Rater
2), .78 (between Rater 1 and Rater 3), and .72 (between Rater 2 and Rater 3), all of which were
statistically significant (p<.01). The raters also reported a high level of confidence in applying the scoring
rubrics to score classroom videos.

Assessment of Rater Consistency in Scoring Classroom Videos 

In the fall of 1998, the principal investigator conducted the quality control and data editing of
classroom videos for the main study. The editing process involved time sampling of video segments
received from 19 classroom teachers who had conducted the assessments for this study in late spring
and early summer of 1998. The time sampling procedure yielded a random selection of 5 one-minute
video segments for each participating teacher. This random time sampling procedure was explained in
detail in the previous chapter. After editing, 95 one-minute classroom teaching episodes (5 one-minute
segments from each teacher in 19 classrooms) were submitted to raters for scoring. Raters
independently coded each one-minute video segment using the scoring rubrics developed for
classroom videos. These 95 scored one-minute video segments
constituted the classroom-level data for the main study. For each segment to be
coded, raters assigned one score for the mode of instruction trait and one score for the variety of learning
materials trait. Then, the five segment codes were averaged. The consistency among raters was assessed
by an inter-rater reliability procedure. Table 7 shows the coefficients of correlation among raters
for the mode of instruction trait, as assessed by classroom videos.

Table 7. Inter-rater Reliability Correlation Matrix of Mode of Instruction in Scoring Videos at the Classroom 
Level. 



              Raters
Raters     (1)       (2)       (3)
(1)        1.00      .78**     .73**
(2)                  1.00      .73**
(3)                            1.00

Note. N = 19 classroom videos scored; ** p < .01.



Table 7 shows that Pearson correlations among raters for the mode of instruction trait
were substantially high; all were above .70. These correlation coefficients were also statistically
significant (p<.01). In short, raters were consistent in coding the mode of instruction trait
at the classroom level, as seen in classroom videos of teaching and learning. The variety of learning
materials used in the classroom trait was also subjected to the assessment of rater consistency. Table 8
below shows the inter-rater correlation matrix for variety of learning materials in scoring classroom videos.



Table 8. Inter-Rater Reliability Correlation Matrix of Variety of Learning Materials in Scoring Videos at the
Classroom Level.

              Judges
Judges     (1)       (2)       (3)
(1)        1.00      .50*      .47*
(2)                  1.00      .48*
(3)                            1.00

Note. N = 19 classroom videos scored; * p < .05.



Compared to the coefficients of correlation among raters for the mode of instruction trait, the
inter-rater correlation coefficients for the variety of learning materials trait were lower (see also
Table 8). However, these coefficients of correlation were still moderate in magnitude and
statistically significant (p<.05). In other words, the consistency among the raters in scoring variety of
learning materials as seen in classroom videos was satisfactory.

The validity of these multiple assessment modes assessing multiple traits was examined by using 
the Multi-Trait Multi-Method (MTMM) approach. This approach requires that there must be at least two 
methods used to assess at least two traits under investigation. This study utilized three methods (surveys, 
drawings and videos) and two classroom traits (mode of instruction and variety of learning materials 
used in the classroom). The MTMM approach also indicates that in order to show evidence of validity, 
both convergent validity and discriminant validity need to be demonstrated. 

Convergent Validity

Coefficients of correlation were used to examine if there was evidence for convergent validity. 
Two sets of analyses were separately conducted for each of the two classroom traits. Findings are 
presented for mode of instruction trait first followed by variety of learning materials trait. Table 9 shows 
coefficients of correlation by assessment mode for the mode of instruction trait. 

Table 9: Correlations by Assessment Method for the Mode of Instruction Trait at the Classroom Level. 



Assessment Mode     Surveys     Drawings     Videos
Surveys             1.00        .52*         .27
Drawings                        1.00         .58**
Videos                                       1.00

Note: ** p < .01; * p < .05.
When the data were analyzed for the mode of instruction trait at the classroom level, the highest
correlation (r = .58) was observed between the student drawings and classroom videos assessment modes,
followed by the correlation between surveys and drawings (r = .52; see also Table 9). The coefficient of
correlation between drawings and videos was statistically significant at the .01 level. The coefficient of
correlation between surveys and drawings was statistically significant at the .05 level. The lowest
coefficient of correlation (r = .27) was obtained from the classroom level analysis of mode of instruction
between closed-ended surveys and classroom videos. Thus, there is strong evidence of convergent
validity, as indicated by the substantially high correlations among the assessment modes, particularly between
student drawings and classroom videos.
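
The significance levels attached to these correlations can be checked with the usual t test for a Pearson coefficient, t = r * sqrt(n - 2) / sqrt(1 - r^2). The Python sketch below is an illustration only: it assumes n = 20 classrooms for the surveys-drawings pair and n = 19 for pairs involving videos (the sample sizes given for each method, though the text does not state the n per pair), and it hard-codes two-tailed critical values from a standard t table for 17 and 18 degrees of freedom. The function names are hypothetical.

```python
from math import sqrt

# Two-tailed critical t values from a standard t table, by degrees of freedom.
CRITICAL_T = {
    17: {0.05: 2.110, 0.01: 2.898},
    18: {0.05: 2.101, 0.01: 2.878},
}


def t_for_r(r, n):
    """t statistic for testing a Pearson r against zero, with n - 2 df."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


def significance(r, n):
    """Label a correlation using the tabled critical values (n of 19 or 20 only)."""
    t = abs(t_for_r(r, n))
    critical = CRITICAL_T[n - 2]
    if t >= critical[0.01]:
        return "p < .01"
    if t >= critical[0.05]:
        return "p < .05"
    return "n.s."


# Mode of instruction correlations from Table 9, with assumed sample sizes.
drawings_videos = significance(0.58, 19)
surveys_drawings = significance(0.52, 20)
surveys_videos = significance(0.27, 19)
```

Under these assumptions the labels agree with the asterisks in Table 9. With only 19 or 20 classrooms, a correlation needs to be roughly .45 or larger to reach the .05 level, which is worth keeping in mind when reading the non-significant coefficients.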

Similar to the analysis conducted for mode of instruction trait, data analysis for the variety of 
learning materials trait in examining the convergent validity involved computing coefficients of 
correlation across assessment modes for this trait at the classroom level. Table 10 shows results. 

Table 10: Coefficients of correlation by assessment mode for variety of learning materials trait at the 
classroom level. 



Assessment Mode     Surveys     Drawings     Videos
Surveys             1.00        .25          -.29
Drawings                        1.00         .40
Videos                                       1.00



The correlations computed for the variety of learning materials trait showed mixed results. The
coefficients of correlation were .25 between surveys and drawings, -.29 between surveys and videos, and
.40 between drawings and videos (see also Table 10). While the former two coefficients of correlation
were relatively low, the correlation between drawings and videos was substantial. As with the
mode of instruction trait, the drawings-videos pair yielded the highest coefficient. Although the correlations
are small in magnitude, this trait provided some supporting evidence for convergent validity.

Discriminant Validity 

The second part of the validation process via the MTMM approach requires the demonstration of
discriminant validity of the traits. This procedure involves the computation of correlation coefficients
between the traits within each assessment method. It is expected that there would be a low correlation
between the traits. Because this study included three methods, three separate correlation coefficients were
computed in order to examine the discriminant validity of these three assessment modes measuring
mode of instruction and variety of learning materials. Table 11 presents the correlation coefficients
between traits for each of the three assessment modes.



Table 11: Coefficients of Correlation Between Mode of Instruction and Variety of Learning Materials by
Assessment Modes at the Classroom Level.

                                          Surveys          Drawings         Videos
Classroom Traits                          MI      LM       MI      LM       MI      LM
Surveys
  Mode of Instruction (MI)                1.00    .13
  Variety of Learning Materials (LM)              1.00
Drawings
  MI                                                       1.00    -.16
  LM                                                               1.00
Videos
  MI                                                                        1.00    -.21
  LM                                                                                1.00

Note: N is 20 classrooms for Surveys and Drawings, and 19 classrooms for Videos.



As shown in Table 11, the correlation coefficients between mode of instruction and variety of
learning materials were .13, -.16, and -.21 for closed-ended surveys, student drawings, and classroom
videos, respectively. All of these correlation coefficients were low and statistically non-significant. That
is to say, in line with the expectation of the study, the two classroom traits appeared to be independent of
each other. These findings provide strong evidence of
discriminant validity.

In sum, the results concerning convergent validity and discriminant validity were largely
consistent with the expectations of this study. In particular, drawings and videos as two assessment methods
provided strong evidence of construct validity within the MTMM framework.
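
One simplified way to read the convergent and discriminant coefficients together is to ask whether each same-trait, different-method correlation exceeds, in magnitude, the different-trait correlations within the two methods involved. The Python sketch below applies that comparison to the values reported in Tables 9, 10, and 11. It is a hedged illustration in the spirit of the MTMM logic, not the procedure the study itself carried out, and the names used are hypothetical.

```python
# Classroom-level correlations as reported in Tables 9, 10, and 11.
CONVERGENT = {
    "mode of instruction": {
        ("surveys", "drawings"): 0.52,
        ("surveys", "videos"): 0.27,
        ("drawings", "videos"): 0.58,
    },
    "learning materials": {
        ("surveys", "drawings"): 0.25,
        ("surveys", "videos"): -0.29,
        ("drawings", "videos"): 0.40,
    },
}
DISCRIMINANT = {"surveys": 0.13, "drawings": -0.16, "videos": -0.21}


def convergent_beats_discriminant():
    """For each trait and method pair, check whether the convergent r is
    larger than the bigger absolute between-trait r of the two methods."""
    results = {}
    for trait, pairs in CONVERGENT.items():
        for (a, b), r in pairs.items():
            ceiling = max(abs(DISCRIMINANT[a]), abs(DISCRIMINANT[b]))
            results[(trait, a, b)] = r > ceiling
    return results


checks = convergent_beats_discriminant()
n_passing = sum(checks.values())
```

Under this simplified rule, five of the six comparisons pass; the exception is the negative surveys-videos correlation for learning materials, which is consistent with the mixed convergent results reported for surveys.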
Discussion



The purpose of the present study was to examine the extent to which student drawings are valid
indicators of the educational ecology of classrooms. Within this framework, Campbell and Fiske’s
(1959) Multi-Trait Multi-Method approach was used to assess the construct validity of drawings as an
assessment tool. Based on the MTMM approach, mode of instruction and variety of learning materials
were the two classroom traits selected for examination in this study. The multiple methods included
closed-ended survey questions, student drawings of teaching and learning, and classroom videos.
Construct validity was assessed by examining convergent validity as well as discriminant validity, as
suggested by the MTMM approach. Prior to the assessment of construct validity, all of the methods were
subjected to extensive reliability analyses.

Reliability procedures for closed-ended survey questions encompassed Cronbach alpha,
split-half reliability and test-retest reliability. Cronbach alpha internal consistency analysis
indicated that, whereas the mode of instruction subscale total score yielded an extremely low coefficient
of reliability (.06 at both the student and classroom levels), the variety of learning
materials subscale produced moderately high consistency (.56) at the student level, which
improved substantially to a coefficient of .78 at the classroom level. Substantial
improvement (from .41 to .63) in the internal consistency of the survey was also observed in the
results for the total scale scores. With the survey items randomly split into two halves, the
Spearman-Brown split-half reliability procedure was also applied to the closed-ended survey
questions. Results indicated a high level of internal consistency of the measure. The coefficient
of correlation between forms was .72 and the coefficient of Spearman-Brown equal-length
reliability was .84. These findings suggest that although the closed-ended survey instrument
shows a high degree of reliability on many counts, the Cronbach alpha reliability procedure raises
some questions about the internal consistency of the measure, particularly in the assessment of the
mode of instruction trait. Therefore, the results concerning the survey questions should be
interpreted cautiously.

Pearson correlations were computed in order to assess rater consistency in scoring student 
drawings. Results indicated that the three raters were significantly consistent in scoring student drawings, 
particularly in coding the mode of instruction trait. The coefficients of correlation among raters were .79, 
.80, and .90 for mode of instruction and .81, .67, and .83 for variety of learning materials and all of the 
coefficients of correlation were statistically significant at the .01 level. As also suggested by Fierros, 
Gulek & Wheelock (1996), results of this study indicate that multiple raters can score student drawings 
with a high degree of reliability. Thus, mode of instruction and variety of learning materials traits in 
student drawings provide a high level of confidence in the interpretation of related findings. 

As with drawings, Pearson correlations were also computed to assess the inter-rater reliability
in coding classroom videos for the mode of instruction and variety of learning materials traits. For the
mode of instruction, the coefficients of correlation among the three raters were .73, .73, and .78, all of
which were statistically significant at the .01 level. For the variety of learning materials, the coefficients
of correlation were relatively lower (.48, .47, and .50) but statistically significant at the .05 level. Findings
from the assessment of reliability in scoring videos suggest that mode of instruction and variety of learning
materials were reliably coded by raters provided with clear scoring guidelines and sufficient
training. Results concerning the video assessment measuring mode of instruction and variety of learning
materials can be confidently interpreted.

The coefficients of correlation among the three assessment methods indicate that drawings and
videos assessing the mode of instruction have a significantly high degree of correspondence (r = .58,
p<.01). The coefficient of correlation between drawings and surveys assessing the mode of instruction
was also significantly high (r = .52, p<.05). In assessing the mode of instruction, surveys and videos
showed a rather low correlation (r = .27, non-significant). In the examination of the variety of learning
materials trait, drawings and videos showed a moderately high but non-significant correlation (r = .40).
For the same trait, coefficients of correlation were .25 between surveys and drawings, and -.29 between
surveys and videos. These findings suggest that, to a certain extent, there is evidence of convergent
validity, particularly between drawings and videos as two unconventional assessment methods. It is worth
noting that the coefficient of correlation between videos and drawings was the highest for the mode of
instruction trait as well as for the variety of learning materials trait.

Discriminant validity was assessed by examining the coefficients of correlation between the
mode of instruction and variety of learning materials traits for each of the three assessment methods. To
demonstrate discriminant validity, these coefficients
were expected to be low and non-significant. They were .13 for surveys, -.16 for
drawings, and -.21 for videos. It is interesting to observe that all of these correlations were smaller
in absolute magnitude than the smallest coefficient of correlation (also in absolute magnitude) in the examination of
convergent validity. These findings provide strong support for discriminant validity.
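The comparison between the discriminant and convergent coefficients can be reproduced from the values reported in the text. The sketch below compares absolute magnitudes, which is an assumption about how "lower" is meant. The variable names are illustrative.

```python
# Convergent (same trait, different methods) coefficients as reported.
convergent = {
    ("mode of instruction", "drawings", "videos"): 0.58,
    ("mode of instruction", "drawings", "surveys"): 0.52,
    ("mode of instruction", "surveys", "videos"): 0.27,
    ("learning materials", "drawings", "videos"): 0.40,
    ("learning materials", "surveys", "drawings"): 0.25,
    ("learning materials", "surveys", "videos"): -0.29,
}

# Discriminant (different traits, same method) coefficients as reported.
discriminant = {"surveys": 0.13, "drawings": -0.16, "videos": -0.21}

largest_discriminant = max(abs(r) for r in discriminant.values())
smallest_convergent = min(abs(r) for r in convergent.values())

# Convergent pairs with a negative sign disagree in direction, whatever their size.
negative_convergent = [key for key, r in convergent.items() if r < 0]

print("largest |discriminant r|:", largest_discriminant)
print("smallest |convergent r|:", smallest_convergent)
print("discriminant below convergent:", largest_discriminant < smallest_convergent)
print("negative convergent pairs:", negative_convergent)
```

The comparison holds only in absolute terms, because the survey-video coefficient for learning materials is negative.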

In general, results concerning convergent validity as well as discriminant validity indicate that
drawings have high correspondence with surveys and classroom videos in assessing teaching and
learning in classrooms. Thus, drawings are valid indicators of the educational ecology of classrooms and
have high potential as an assessment tool for gaining insight into classroom teaching and learning.

While the results are generally in line with the expectations of this study, there were also
unexpected results. For example, although the inter-rater reliability was significant and relatively high
for the variety of learning materials trait in scoring videos, it was not as high as that
observed in scoring drawings. One possible source of this difference is the difference in
scoring rubrics. In drawings, raters were instructed to list any object on the scoring sheet as a learning material
if the student included it in the drawing, whether or not students used it during instruction. In videos,
however, raters were instructed to list an object as a learning material only if it was actively used during
instruction. The principal investigator was aware of this problem and acknowledges it as a limitation of
this particular study. Possible solutions to this problem are discussed in the limitations section of this
chapter.

In addition, the examination of convergent validity for the variety of learning materials
trait showed unexpectedly low and non-significant coefficients of correlation, particularly between
surveys and drawings and between surveys and videos. This problem may be due to the
difference in scoring guidelines, to relatively lower levels of inter-rater
reliability, or to both. Nonetheless, surveys seem to produce lower coefficients of correlation with the other assessment
methods.

Implications for Education 

Recent literature on middle school reform has emphasized that schools should be places
where teachers can engage in critical study of their own practices (Darling-Hammond, 1988; Glickman,
1993; Sirotnik & Oakes, 1990). Student drawings, along with classroom videos and a tailored student
reflection survey, may provide a basis for practitioners at the school level to improve their own practice.
Student perceptions of teaching and learning may balance the emphasis on outcome data in school
accountability practices. In their recent article on using multiple modes of assessment of student learning
and attitudes to promote instructional improvement, Haney, Pedulla, & Wheelock (1997) suggested four
major points: focusing on purpose, engaging people throughout, examining process as well as
outcomes, and employing multiple methods of assessment. By using multiple methods with multiple traits in
assessing teaching and learning in classrooms, this study offers many prospects for school practitioners
to examine and reflect on their teaching practices.

Focusing on classroom instructional practices, Schon (1991) argued that improved teaching
comes only by reflecting on past practices. By critically examining their own practices in the context of
schools’ conception of “good” teaching and learning, reflective teachers are better able to “devise
alternative ways of teaching in order to rise to the many challenges inherent in today’s schools” (Sack,
1997). Videos have a long tradition as a vehicle for training pre-service teachers and building critical
inquiry skills in reflective practitioners. Student drawings have as much potential to
evoke teacher reflection about instructional practices. Because of their nature, drawings provoke
teachers’ curiosity, greater engagement, and reflection on teaching and learning. According to Scherr
(1993), these are the keys for a reflective practitioner. Indeed, student drawings of teachers at work in the
classroom illuminate the educational ecology of classrooms from the perspective of its most attentive
observers, namely students. This information constitutes the foundation for reflective practice,
in which the empowered teacher shows reflection in action as critical inquiry and the empowered student
displays critical thinking and learning for successful school reform.

As also concluded by Weber & Mitchell (1996), the way drawings were collected, analyzed, and
interpreted in this study might prove useful in providing a way to evaluate, challenge, or reflect on these 
images of students. Inviting teachers to draw, or have their students draw, and then share their drawings, 
or to write or talk about them, provides an excellent forum for critical reflection. This brings to light the 
nuances and ambivalence in students' view of teachers that can inform our professional knowledge of 
teacher education. 

Just as different age groups or different genders or different ethnic groups may provide a variety 
of messages in drawings, individuals who are different in terms of self-expression may produce quite 
different accounts in their drawings; some may have a richer content and expression than others. As 
proposed by the Multiple Intelligence Theory (Gardner, 1983, 1993), intelligence may take a variety of
different forms and use a combination of the seven frames of intelligence (linguistic, logical-mathematical,
personal [sensitivity to others and sensitivity to oneself], spatial, musical, and bodily-kinesthetic).
To illustrate, some individuals may better express themselves in a multiple-choice type of
assessment while others may be bodily kinesthetic and could best express themselves in front of a video 
camera. Yet, still others may be spatially talented. For such individuals there is no more powerful tool to 
assess their attitudes toward teaching and learning than using drawings. The present study offered a total 
assessment approach that is inclusive of closed-ended surveys, student drawings, and classroom videos 
to gain insight into teaching and learning. In doing so, students had the opportunity to express themselves 
through various forms of self-expression. Greeno and Hall (1997) also noted that “every student’s
educational activities should include the rich variety of experience and learning made possible through
participation in multiple practices of representation” (p. 361). It is hoped that using multiple methods of
assessment in illuminating the educational ecology of classrooms will provide a different model for
teachers to better understand their students’ learning experiences and to reflect on their own teaching
practices. As Snow (1997) pointed out, “teacher facility in and use of the multiple symbol systems
relevant to a particular subject matter, at a particular time, in a particular place, and with a particular
student population may be the essential feature of adaptive teaching” (p. 354).

Many public school systems operate with a limited budget. When it comes to the allocation of 
budget and resources, cost constitutes a real issue, particularly concerning assessment and evaluation. 
Because many states mandate school systems to administer a standardized test to be held “accountable”, 
sometimes cost becomes the only reason to favor a particular standardized test. Therefore, it is important 
for test developers to take cost-effectiveness into account when constructing a new assessment tool. 
Many pre-service teacher education institutions cite the value of video as a training and
observation tool. Video has also played an important role in the Third International Mathematics and Science
Study (TIMSS), one of the largest studies ever undertaken, in examining cross-cultural differences in
teaching and learning. Unlike in TIMSS, funding may constitute an issue for some pre-service
teacher training programs. Since this study provided strong evidence for the correspondence between
classroom videos and student drawings, drawings may offer a way to gain visual information about
students’ attitudes toward schooling and their learning experiences that is less costly than video
recordings. When possible, videos and drawings should be used as complementary tools to gain insight
into student perspectives on teaching and learning. However, when schools have a limited budget
for assessments and teacher development, drawings may constitute the poor man’s video recorder for
such schools.

The use of children’s drawings of this kind nonetheless calls for caution. School practitioners are urged not to read
much meaning into individual drawings but rather to seek to document patterns across random samples of
responses for the entire school (Haney et al., 1998; Haney et al., 1997), because some unusual drawings
may provide hints that point to a particular teacher or a particular student, which could be used unethically
against that person (Haney et al., 1997). Drawings are personal expressions of artists’ inner worlds and must
be respected as such. The storage and use of drawings must comply with the ethical standards for conducting
research with human subjects. These ethical standards may include the American Psychological Association’s
(1985) Standards for Educational and Psychological Tests and Manuals, as well as the American Art
Therapy Association’s (1999) Code of Ethical Standards for Art Therapists. It is important to treat art
products as personal and confidential material and to give them the ethical respect they
deserve.



Limitations and Recommendations for Future Research 



The present study utilized a rigorous design. The arrangement of participating teachers required 
many site visits including meetings with the district superintendent, school principals, invitation of 
teachers and students to participate in the study, meetings with teachers, clarification of the scope of the 
study and follow-up letters to confirm the participation, dates of assessments, and participant 
acknowledgement. In spite of the precautions, this study suffered from several limitations. 

First of all, the closed-ended survey instrument yielded lower levels of internal consistency than 
was expected. It is not clear whether the low level of reliability was due to the collection of items from 
different sources, or due to inclusion of only three classrooms in the pilot assessment. Future research 
may involve an extensive internal consistency study with the inclusion of more classrooms to construct a 
more reliable survey questionnaire. 
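An internal consistency study of the kind recommended above is often summarized with Cronbach's alpha. The sketch below assumes that statistic, which this passage does not name, and uses made-up item responses. `cronbach_alpha` and the data are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents' item-score lists.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of totals)
    """
    k = len(item_scores[0])                     # number of items
    columns = list(zip(*item_scores))           # scores grouped by item
    item_var_sum = sum(pvariance(col) for col in columns)
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical responses: five students answering four survey items (1 to 4).
responses = [
    [4, 3, 4, 3],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [1, 2, 1, 2],
    [4, 4, 3, 4],
]

print(round(cronbach_alpha(responses), 2))
```

Population variances are used for both terms. The ratio is the same with sample variances, provided one choice is applied to both.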

The second limitation was the sample size. The nature of this study required classroom-level
analysis, and thus the number of classrooms participating in the study was critical. Due to limited time and
resources, the principal investigator was able to recruit only 20 classrooms. Future replications of this
study should aim to include at least 30 classrooms in order to make more meaningful statistical
inferences.
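One way to see what moving from 20 to 30 classrooms buys is to compare the smallest correlation that reaches significance at each sample size. The sketch below assumes two-tailed tests at the .05 level and uses critical t values from a standard t table. The function name is illustrative.

```python
from math import sqrt

# Two-tailed critical t values at alpha = .05 from a standard t table,
# keyed by degrees of freedom (n - 2).
T_CRIT_05 = {18: 2.101, 28: 2.048}


def critical_r(n):
    """Smallest |r| significant at .05 (two-tailed) for n paired observations."""
    df = n - 2
    t = T_CRIT_05[df]
    return t / sqrt(t ** 2 + df)


for n in (20, 30):
    print(f"n = {n}: |r| must be at least {critical_r(n):.3f}")
```

Under these assumptions, a coefficient of .40, such as the one observed between drawings and videos for the variety of learning materials, falls short of the threshold with 20 classrooms but would exceed it with 30 if the same value were observed.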

The third limitation was related to the scoring guidelines for the variety of learning materials
trait. The scoring rubrics for videos were slightly different from those for drawings. In classroom videos, raters
listed an object as a learning material only if it had been actively used during
instruction. In drawings, however, raters listed any object as a learning material if it
had been depicted, whether or not it was in use. This difference in the scoring of the learning materials trait may
raise concerns about the comparability of the two sets of scores. Future
research may include asking students to write a description of their drawings. Such descriptions
by the artists may resolve these disparities in scoring rubrics. Also, should an investigator
decide to develop new scoring guidelines, this difference should be taken into account.
Additionally, follow-up interviews with randomly selected students may help
researchers understand whether this difference in scoring guidelines constitutes a source of error in the results.
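The effect of the two rubrics can be illustrated with a small sketch. It assumes the variety score is the count of distinct materials listed, which this passage does not confirm. The `Material` type, the `variety_score` function, and the classroom data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Material:
    name: str
    actively_used: bool  # observable in videos; usually unknown in drawings


def variety_score(materials, require_active_use):
    """Count distinct learning materials under one of the two rubrics."""
    names = {
        m.name
        for m in materials
        if m.actively_used or not require_active_use
    }
    return len(names)


# Hypothetical classroom: four objects present, two in active use.
classroom = [
    Material("textbook", True),
    Material("blackboard", True),
    Material("globe", False),
    Material("computer", False),
]

drawing_rule = variety_score(classroom, require_active_use=False)
video_rule = variety_score(classroom, require_active_use=True)
print("drawing rubric:", drawing_rule)
print("video rubric:", video_rule)
```

The same classroom receives different scores under the two rules, which is one way the rubric difference could lower the correlation between methods.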

The data collection for this study consisted of a single-day, one-time assessment for each
participating classroom. This is not necessarily considered a limitation of this particular study,
because the drawing prompt and the video recording covered a particular time frame and yielded
satisfactory results. However, longitudinal multiple assessments with a clear purpose, matrix sampling of
students, engaging people throughout, evaluating process as well as outcomes, and setting school goals
may provide more comprehensive information about the school system. Such assessments would help schools track
change over time and follow up on the correspondence among the purposes, the process, and the goals in
order to assess the impact of their school reform efforts. Multiple longitudinal assessments can also help
school practitioners understand whether subject matter, gender, or developmental stages of children play an
important role in explaining the variation in drawings across classrooms.

As suggested by Sack (1997), drawing self-portraits may become a useful activity for pre-service
and in-service teacher workshops. Teachers could be asked to draw a picture of themselves in a
classroom teaching situation and then discuss what these drawings reveal about their teaching activities
and experiences. This simple procedure may yield a wealth of discussion about teacher roles and
expectations, teaching practices, ideas about teaching styles, and educational philosophies. Due to
its engaging nature, this procedure may also be used as an “ice-breaker” activity in a teacher-training
workshop.

Lastly, it is recommended that drawings be used as an integral part of a comprehensive
assessment system. Whereas standardized tests may reveal a great deal of information about what
students already know about a particular subject matter, drawings may provide a window into students’
ways of learning. For example, Lifford’s (1998) study of children’s drawings yielded important findings
about how students read, reflect, and visualize. Likewise, Black (1991) drew invaluable
information from student drawings about the development of the writing process in elementary school children.
As suggested by Stufflebeam (1993), Stake (1993), and Guba and Lincoln (1993), assessing the processes
of schooling and education as well as outcomes is an important part of curriculum development and
program evaluation.

Guidelines to Score Drawings



Section I: Mode of Instruction 

Score of 4 (Highly Student-Centered Mode of Instruction):

• Student desks are clustered

• Students are working in groups/pairs 

• Teacher talk, if any, invites discussion (e.g., praises, questions) 

• Active learning is apparent (i.e., students are engaged in an activity) 

• Teacher is with/nearby students. 



Sample Drawings: 

Score of 3 (Moderately Student-Centered Mode of Instruction):



• Student desks are usually clustered. If desks are in rows, active learning should be apparent (i.e., students are 
engaged in an activity) 

• Students are seated in groups/pairs 

• Teacher is at a distance (at blackboard or at teacher’s desk) 

• At least two people (two students or one student-one teacher) are included in the picture and there should be
interaction (e.g., content-related talk, engaged in an activity collectively). If only one student is present, active
learning should be apparent (i.e., the student should clearly be engaged in an activity)

Sample Drawings: 

Score of 2 (Moderately Teacher-Directed Mode of Instruction):



• Student desks are in rows. 

• Students are seated in rows. 

• If depicted, the teacher is at a distance (at blackboard or at teacher’s desk) and lecturing. If the teacher is not 
depicted, there should be at least one student present in the picture. 



Sample Drawings: 

Score of 1 (Highly Teacher-Directed Mode of Instruction):



• Only the teacher is depicted; students are not present in the picture.

• If depicted, student desks are in rows. 

• The teacher is depicted at the blackboard, or at teacher’s desk. 

• Teacher talk, if any, is lecturing or disciplining. 



Sample Drawings: 

Guidelines to Score Drawings



Section II: Variety of Learning Materials 

Directions: Please list all of the learning materials used/depicted in student drawings. Learning materials may
include, but are not limited to, computers, textbooks, worksheets, overhead projectors, calculators,
posters, graphs, blackboard, maps/globe, experiment equipment, rulers, puzzles, and TV/VCR. Below
are some examples.

Sample Drawings and Learning Materials Portrayed: 

Guidelines to Score Classroom Videos



Section I: Mode of Instruction 

Score of 4 (Highly Student-Centered Mode of Instruction):

• Student desks are clustered 

• Students are working in groups/pairs 

• Students frequently participate in class discussions 

• Teacher talk invites discussion (e.g., praises, questions) 

• Frequent interaction between students and teacher, and among students 

• Teacher frequently assists students 

• Active learning is apparent (i.e., students are engaged in an activity) 

• Teacher is with/nearby students 

Score of 3 (Moderately Student-Centered Mode of Instruction): 

• Student desks are usually clustered, although desks may be in rows, as an exception, but active learning should 
be apparent (i.e., students are engaged in an activity) 

• Students are working in groups/pairs 

• Students sometimes participate in class discussions 

• Teacher sometimes invites discussion (e.g., praises, questions) 

• Occasional interaction between students and teacher, and among students 

• Teacher occasionally assists students 

• Active learning is usually apparent (i.e., students are sometimes engaged in an activity) 

• Teacher is sometimes at a distance (e.g., blackboard, teacher’s desk) and sometimes he/she is with/nearby 
students 

Score of 2 (Moderately Teacher-Directed Mode of Instruction): 

• Student desks are in rows, although desks may be clustered, as an exception where students are not engaged in 
an activity 

• Students are working individually 

• Students rarely participate in class activities 

• Teacher lectures most of the time 

• Infrequent interaction between students and teacher, and among students

• Teacher rarely assists students 

• Passive learning is apparent (i.e., students are not engaged in an activity, they passively listen to the teacher) 

• Teacher is frequently at a distance (e.g., blackboard, at teacher’s desk) 

Score of 1 (Highly Teacher-Directed Mode of Instruction):

• Student desks are in rows 

• Students are working individually 

• Students do not participate in class activities (i.e., they are quiet) 

• Teacher disciplines students (e.g., “sit down,” “shut up,” “be quiet”) 

• No interaction between students and teacher, and among students 

• Teacher does not assist students 

• Learning is not apparent (i.e., students are not engaged in an activity or they do not listen to the teacher) 

• Teacher is at a distance (e.g., blackboard, at teacher’s desk) 

Guidelines to Score Classroom Videos

Section II: Variety of Learning Materials 



Directions: You will be shown a series of one-minute video segments from different classrooms. Your task is to list
all of the learning materials actively used in these one-minute classroom video segments on the scoring
sheet provided to you. Learning materials, as they appear in video segments, may include, but are not
limited to, computers, textbooks, worksheets, overhead projectors, calculators, posters, graphs,
blackboard, maps/globe, experiment equipment, rulers, puzzles, and TV/VCR. There are about 15
seconds of elapsed time between one-minute classroom video segments. This time is reserved
for you to score the previous one-minute video segment. Please feel free to rewind the tape if the time
between segments is insufficient.